Molecular Ecology Resources

Wiley

Preprints posted in the last 90 days, ranked by how well they match Molecular Ecology Resources's content profile, based on 161 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.

1
Accuracy of occurrence and abundance estimates from insect metabarcoding

Iwaszkiewicz-Eggebrecht, E.; Granqvist, E.; Nowak, K. H.; Valdivia, C.; Buczek, M.; Srivathsan, A.; Hartop, E.; Miraldo, A.; Roslin, T.; Tack, A. J. M.; Lukasik, P.; Meier, R.; Ronquist, F.

2026-02-22 zoology 10.64898/2026.02.20.707016 medRxiv
Top 0.1%
59.0%

1. DNA metabarcoding (high-throughput sequencing of barcode regions from bulk samples) has become a key tool for insect biodiversity assessment. Yet, how methodological choices affect the accuracy of metabarcoding data remains insufficiently explored. In this paper, we ask: (1) How does the lysis method (non-destructive lysis vs. destructive homogenization) affect community recovery? (2) How comprehensively does metabarcoding capture species richness? (3) To what extent can spike-ins improve abundance estimates? (4) How accurately can species abundances be estimated?
2. We evaluated the accuracy of insect metabarcoding using 4,749 bulk samples from a large-scale biodiversity survey subjected to mild lysis. Of these samples, 856 were also homogenized, allowing a systematic comparison of the effect of alternative treatments. To potentially improve abundance estimates, we added six biological spike-ins (i.e., foreign insects) to all samples, and two synthetic spike-ins (artificial DNA fragments) to the homogenization treatment. In addition, we established the contents of 15 samples by individually barcoding all specimens, enabling direct assessment of occurrence and abundance estimates.
3. Our results revealed consistent differences between destructive and non-destructive treatments. While both methods reliably detected the majority of species, small and soft-bodied taxa were more often recovered after mild lysis than after homogenization, while the reverse was true for heavily sclerotized, hairy, and large taxa. Using biological spike-ins for calibration reduced the variance in read numbers per specimen considerably, especially in homogenized samples, while synthetic spike-ins were less effective. In a Bayesian analysis, where species data were matched to the best-fitting spike-in calibration curve, accurate abundance estimates (±1 individual) were obtained for 72.9% of species occurrences.
4. Our results show that it is possible to obtain reasonably accurate abundance estimates from metabarcoding data, and that mild lysis and homogenization result in different taxon-specific biases in terms of occurrence data, with neither method outperforming the other. Accuracy of abundance estimates is improved by homogenization rather than mild lysis of samples, and by the use of biological rather than synthetic spike-ins. Together, these findings provide a major step towards robust, quantitative biodiversity monitoring using DNA metabarcoding.
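
As a rough illustration of the spike-in idea in this abstract, the sketch below scales per-species read counts by the mean number of reads per spike-in specimen in the same sample. It is a deliberately simplified stand-in and not the Bayesian calibration-curve analysis the authors describe; the function name, the input layout, and all numbers are hypothetical.

```python
def estimate_abundance(species_reads, spikein_reads, spikein_specimens):
    """Estimate specimens per species from reads, calibrated by spike-ins.

    species_reads:     dict species -> read count in one sample
    spikein_reads:     dict spike-in taxon -> read count in the same sample
    spikein_specimens: dict spike-in taxon -> known number of specimens added
    (All three structures are assumptions made for this sketch.)
    """
    total_reads = sum(spikein_reads.values())
    total_specimens = sum(spikein_specimens[t] for t in spikein_reads)
    if total_reads == 0 or total_specimens == 0:
        raise ValueError("spike-ins were not recovered; cannot calibrate")
    reads_per_specimen = total_reads / total_specimens
    # A detected species is assumed to be represented by at least one specimen.
    return {
        species: max(1, round(reads / reads_per_specimen))
        for species, reads in species_reads.items()
        if reads > 0
    }


# Hypothetical sample with two spike-in taxa, one specimen of each.
sample = estimate_abundance(
    species_reads={"sp_A": 2100, "sp_B": 480, "sp_C": 0},
    spikein_reads={"spike_1": 900, "spike_2": 1100},
    spikein_specimens={"spike_1": 1, "spike_2": 1},
)
```

A single linear scaling factor ignores the taxon-specific biases the abstract reports, which is presumably why the authors match each species to a best-fitting calibration curve instead.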

2
A Novel eDNA-Based Approach for Hybrid Detection: Implications for Conservation Management

Sakata, M. K.; Yano, N.; Imamura, A.; Yamanaka, H.; Minamoto, T.

2026-03-27 ecology 10.64898/2026.03.26.714632 medRxiv
Top 0.1%
34.9%

Hybridization between invasive and native species poses a hidden but critical threat to biodiversity. While environmental DNA (eDNA) has revolutionized species monitoring, it has lacked the resolution to detect hybrid individuals. Here, we present the first experimental demonstration of hybrid identification using eDNA. Our method isolates a single cell in the environment (hereafter, eCell) and enables cellular-level analysis using multiplex digital PCR targeting nuclear markers from both parental species. Validation with controlled tank experiments using Oncorhynchus masou masou × Salvelinus leucomaenis leucomaenis hybrid individuals confirmed the method's ability to detect hybrid individuals separately from co-habiting purebred parental individuals. This eCell analysis overcomes the limitations of traditional eDNA methods and offers a scalable, non-invasive tool for detecting cryptic hybridization. By enabling early and accurate detection of hybrid individuals, it supports timely conservation decisions, including management prioritization and the protection of purebred populations. This novel technique bridges a critical gap in conservation genetics and enhances eDNA's utility for biodiversity management in the face of global change.

3
Protocol for genotyping cephalopod sex using a skin swab and quantitative PCR

Montague, T. G.; Rubino, F. A.; Gibbons, C. J.; Mungioli, T. J.; Small, S. T.; Coffing, G. C.; Kern, A. D.

2026-04-02 molecular biology 10.64898/2026.03.31.715692 medRxiv
Top 0.1%
28.2%

The coleoid cephalopods (octopus, cuttlefish, and squid) are emerging model organisms for neuroscience, development, and evolutionary biology. Determining their sex early in life is critical for population management and controlled experiments. Here, we present a protocol to non-invasively determine the sex of multiple cephalopod species as young as 3 hours post-hatching using a skin swab and quantitative PCR (qPCR). We describe steps for designing qPCR primers, swabbing live animals, extracting DNA, running the qPCR, and analyzing the results. For complete details on the use and execution of this protocol, please refer to Rubino et al. (ref. 1).
Highlights: Swab live cephalopods as early as 3 hours post-hatching; extract DNA from cephalopod skin swabs; perform qPCR-based sex determination; design and validate qPCR primers for new species.
[Graphical abstract: Figure 1]

4
Resampling-based validation of a SNP panel for hybrid detection across generations: a case study in European lobster

Roed, E. S.; Ellis, C.; Stevens, J.; Chavarie, L.; Saitou, M.

2026-01-23 genomics 10.64898/2026.01.20.700654 medRxiv
Top 0.1%
26.2%

Accurate detection of hybridization and introgression is critical for both evolutionary research and applied conservation. In many systems, however, hybrid ancestry is difficult to detect beyond the F1 generation, especially when based on limited genetic markers. In European waters, hybridization between the native Homarus gammarus and the invasive H. americanus poses a direct risk to the integrity of native stocks and effective fishery management, yet detection methods are often limited to morphological traits or first-generation hybrids. A set of 79 SNPs previously developed to distinguish American lobsters, European lobsters, and F1 individuals has shown promise, but its capacity to resolve later-generation backcrosses remains untested. Here, we present a resampling-based evaluation of this panel's performance under realistic introgression scenarios, using individual-based population genetic models informed by empirical data. We show that the panel retains discriminatory power across multiple hybrid classes, with diminishing accuracy in second-generation backcrosses. These findings validate the panel's utility for applied monitoring and highlight the broader potential of resampling-anchored frameworks to benchmark hybrid detection tools in a wide range of species.
Article summary: This study tests how well a reduced panel of genetic markers can detect hybridization across multiple generations. Using empirical genetic data of a 79-SNP panel from European and American lobsters, the authors generated individuals with known ancestry proportions through a resampling framework that preserves observed genetic variation. These data were analysed using model-based genetic assignment and ordination. The results show that the marker panel reliably identifies pure species and first-generation hybrids, but has reduced power to distinguish later backcross generations, mainly between adjacent hybrid classes. The study provides a practical benchmark for evaluating reduced marker panels used in applied monitoring and conservation genetics.
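
To make the idea of generating individuals with known ancestry more concrete, here is a minimal sketch that builds an F1 and a first-generation backcross by drawing one allele per locus from each parent. It assumes fully diagnostic loci (species A fixed for allele 0, species B for allele 1), which is a simplification; the study resamples from empirical genotypes, and every name below is hypothetical.

```python
import random


def make_offspring(parent_a, parent_b, rng):
    """Draw one allele per locus from each parent.

    Genotypes are lists of 2-tuples, one tuple of alleles per locus.
    """
    return [(rng.choice(ga), rng.choice(gb)) for ga, gb in zip(parent_a, parent_b)]


def species_a_fraction(genotype, a_allele=0):
    """Fraction of all alleles that match the species-A allele."""
    alleles = [allele for locus in genotype for allele in locus]
    return alleles.count(a_allele) / len(alleles)


rng = random.Random(42)      # fixed seed so the sketch is repeatable
n_loci = 79                  # panel size taken from the abstract
pure_a = [(0, 0)] * n_loci   # assumption: species A fixed for allele 0
pure_b = [(1, 1)] * n_loci   # assumption: species B fixed for allele 1

f1 = make_offspring(pure_a, pure_b, rng)
backcross = make_offspring(f1, pure_a, rng)

f1_ancestry = species_a_fraction(f1)                 # exactly 0.5 by construction
backcross_ancestry = species_a_fraction(backcross)   # expected near 0.75
```

With only 79 loci, the backcross ancestry fraction varies noticeably around its expectation from one simulated individual to the next, which hints at why adjacent hybrid classes become hard to separate.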

5
Enhancing Detection of Polygenic Adaptation: A Comparative Study of Machine Learning and Statistical Approaches Using Simulated Evolve-and-Resequence Data

Caliendo, C.; Gerber, S.; Pfenninger, M.

2026-02-24 genetics 10.1101/2024.11.28.625827 medRxiv
Top 0.1%
23.0%

Detecting signals of polygenic adaptation remains a significant challenge in population genomics, as traditional methods often struggle to identify the associated subtle, multi-locus allele-frequency shifts. Here, we introduced and tested several novel approaches combining machine learning techniques with traditional statistical tests to detect polygenic adaptation patterns in time-series of allele frequency changes from whole genome data. We implemented a Naive Bayesian Classifier (NBC) and One-Class Support Vector Machines (OCSVM), and compared their performance against the classical Fisher's Exact Test (FET). Furthermore, we combined machine learning and statistical models (OCSVM-FET and NBC-FET), resulting in five competing approaches. Using a simulated data set based on empirical evolve-and-resequencing Chironomus riparius genomic data, we evaluated methods across evolutionary scenarios, varying in generations, selection strength and numbers of loci under selection. Our results demonstrate that the combined OCSVM-FET approach consistently outperformed competing methods, achieving the lowest false positive rate, highest area under the curve, and high accuracy. The performance peak aligned with what we term the late dynamic phase of adaptation (the period after initial selection has occurred but before fixation), highlighting the method's sensitivity to ongoing selective processes and thus its value for experimental approaches. Furthermore, we emphasize the critical role of parameter tuning, balancing biological assumptions with methodological rigor. Our approach offers a powerful tool for detecting polygenic adaptation from time series, e.g. pool sequencing data from evolve-and-resequence experiments.
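
For readers unfamiliar with the baseline mentioned in this abstract, the sketch below implements a two-sided Fisher's exact test on a 2x2 table of allele counts at one SNP from two time points, using only the standard library. It illustrates the classical test only; the OCSVM and NBC components, and how the authors combine them with the test, are not reproduced, and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)   # small tolerance for floating-point ties
    )


# Hypothetical allele counts (reference, alternative) at one SNP:
# first time point 50/50, later time point 80/20.
p_value = fisher_exact_two_sided(50, 50, 80, 20)
```

A per-SNP test like this treats each locus independently, which is one reason subtle multi-locus shifts are hard to detect with the test alone.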

6
A dual-clam species 63K SNP array for sustainable production and conservation of wild resources

Gallo, M.; Babucci, M.; Fernandez, S.; Bean, T.; Dalla Rovere, G.; Smits, M.; Penaloza, C.; Houston, R.; Woolley, S.; Cicala, F.; Franch, R.; Ferraresso, S.; Nai, I.; Patarnello, T.; Omele, A. E.; Blanco, A.; Sambade, I.; Martinez, P.; Bargelloni, L.; Milan, M.; Peruzza, L.

2026-02-06 genomics 10.64898/2026.02.06.704329 medRxiv
Top 0.1%
22.8%

Bivalves play an essential role in coastal ecosystems, and their aquaculture is an important economic sector in Europe with a pivotal role in the EU Blue Growth Strategy. Among clam species, the Manila clam, Ruditapes philippinarum, and the grooved carpet shell, R. decussatus, are within the top five species in terms of production volume and economic value. In this study we designed and validated the first medium-density 63K single nucleotide polymorphism (SNP) array for these two commercially important species. By leveraging a new chromosome-level genome assembly for R. philippinarum and that of R. decussatus, we identified over 300 million SNPs through whole-genome resequencing and genotyping-by-sequencing strategies. After stringent filtering, we selected 49,392 high-quality SNPs for R. philippinarum and 14,193 for R. decussatus to construct a dual-species array. Array validation was carried out by genotyping 384 individuals across multiple wild populations and hatchery samples, demonstrating excellent performance, with 67.7% and 67.5% of SNPs classified as high-quality polymorphic markers for R. philippinarum and R. decussatus, respectively. Minor allele frequency, missing data rate, and inter-marker distance met stringent quality thresholds, confirming the array's robustness for clam population genetics. Parentage analysis in R. philippinarum families highlighted significant power for pedigree reconstruction in breeding programs. This publicly available genomic resource provides a reliable, cost-effective genotyping platform to enable population genomics, advanced selective breeding, genome-wide association studies, and genetic monitoring, ultimately strengthening management of genetic diversity and sustainable farming for two key clam species.

7
Is DNA metabarcoding an option for formaldehyde-preserved zooplankton time series?

Albaina, A.; Lanzen, A.; Miguel, I.; Rendo, F.; Santos, M.

2026-02-09 zoology 10.64898/2026.02.06.704415 medRxiv
Top 0.1%
22.8%

The recovery of amplifiable DNA from formaldehyde-fixed (FF) zooplankton samples has long been considered unfeasible. Nevertheless, advances in DNA sequencing and methods for retrieving highly degraded genetic material have demonstrated that even million-year-old samples and FF museum specimens can yield usable DNA. To access the biological information preserved in long-term zooplankton time series, we assessed methodologies for extracting amplifiable DNA from community samples stored for up to 28 years in formaldehyde at room temperature. On one hand, we report the failure of a method previously described as successful for FF zooplankton samples, likely due to the cold-storage conditions (4 °C) used in the original study. On the other hand, by adapting two extraction protocols designed for FF museum specimens, representing harsher and softer alternatives (HHA and HPC, respectively), we successfully amplified and sequenced a subset of FF zooplankton samples. As expected, DNA integrity and sample pH were inversely related to preservation time, and only short DNA fragments were recovered, ruling out the use of commonly employed ≥300 bp metabarcoding markers. While DNA integrity appeared to be a better predictor than DNA yield for amplification success, the presence of a gel band of the expected size did not always guarantee congruence with microscopy-based assessments. Although amplifiable DNA was recovered from most samples, including some of the oldest, community compositions concordant with microscopy were consistently recovered only from samples preserved for up to two years. Beyond this point, the HHA and HPC methods produced divergent results, reflecting a trade-off between the removal of formaldehyde-induced cross-linkages and the avoidance of additional DNA damage. Among the small universal markers tested (~120-170 bp), including one nuclear rRNA marker and two mitochondrial markers, only the 18S rRNA V9 region consistently amplified. We conclude by providing a set of recommendations aimed at improving the methods presented here.

8
On-site metabarcoding analysis of environmental DNA samples

Mauvisseau, Q.; Ewer, I.; Blumeris, I.; Iren Bongo, S.; Filipe Brito de Oliveira, L.; Gouvea, B.; Carolina Cei, A.; Ferreira Rodrigues, K.; de Arruda Francisco, J.; Sletteng Garvang, E.; Marena do Rego Henriques, V.; Hurtado Solano, S.; Kvalheim, L.; Kaylynne Lawrence, S.; Ramalho Maciel, B.; Isanda Masaki, H.; Fortunate Mashaphu, M.; Masimula, L.; Prudent Mokgokong, S.; Katrin Onshuus, E.; Lima Paiva, B.; Parker-Allie, F.; Du Plessis, M.; Puzicha, M.; Gabriel Da Silva Solano Reis, O.; Speelman, G.; Moritz Splitthof, W.; Stocco de Lima, A. C.; Strindberg, H.; Smoge Saevik, O.; Tafjord, N. J. D

2026-03-30 ecology 10.64898/2026.03.27.714757 medRxiv
Top 0.1%
18.9%

Environmental DNA metabarcoding is a powerful monitoring tool for assessing aquatic biodiversity, as well as the sustainability and impacts of fisheries and aquaculture. However, conventional laboratory workflows remain time-consuming and dependent on dedicated infrastructures. Here, we present a field trial of a fully portable, off-grid eDNA metabarcoding pipeline that enables end-to-end analysis within a few days using compact equipment, including a BentoLab workstation and an Oxford Nanopore Technologies (ONT) MinION sequencer. The workflow was implemented during two international training courses in Norway and Brazil, where students and early career researchers collected environmental samples, extracted and amplified DNA, prepared DNA libraries, and sequenced on-site before performing bioinformatics and statistical analyses. In the case study detailed here, seven eDNA samples collected and analysed on-site in the Oslofjord allowed detection of 16 fish and elasmobranch species. Although overall diversity was lower than in earlier studies using Illumina-based sequencing, our protocol reliably detected key species and demonstrates that portable eDNA metabarcoding is feasible for rapid ecological assessment, surveillance of high-risk regions and/or deployment in remote or resource-limited settings.

9
Reference genome choice impacts SNP recovery but not evolutionary inference in young species

Soares, L. S.; Goncalves, L. T.; Guzman-Rodriguez, S.; Bombarely, A.; Freitas, L. B.

2026-02-06 bioinformatics 10.64898/2026.02.04.703758 medRxiv
Top 0.1%
18.7%

Reduced-representation sequencing approaches such as RAD-seq are widely used in population genomics and phylogenetics, particularly for non-model organisms. However, bioinformatics choices during data processing can strongly influence downstream analyses. One key but underexplored factor is the reference genome used for read alignment and SNP discovery. Here, we evaluate the effects of reference genome choice on RAD-seq analyses using multiple datasets spanning recent radiations in Petunia and Calibrachoa, and reference genomes that differ in phylogenetic relatedness. When using congeneric reference genomes, we observed highly consistent mapping rates, SNP recovery, and downstream population genomic patterns. In contrast, mapping to more distantly related genomes resulted in lower mapping rates and stronger effects on summary statistics. Despite these quantitative reductions, broader patterns of genetic structure and diversity, as well as evolutionary relationships, remained largely congruent across reference genomes. Overall, our results indicate that reference genome choice matters most when genomes are distantly related or when analyses target fine-scale genomic signals. For recent radiations with largely conserved genome structure, closely related reference genomes yield comparable SNP datasets and lead to the same biological conclusions regarding population structure and phylogenetic relationships. These findings provide practical guidance for RAD-seq studies in non-model systems, showing that congeneric reference genomes are sufficient for robust population and phylogenetic inference, and that more distantly related genomes can remain informative when no close reference is available.

10
Epigenome-informed prioritization of bivalent chromatin SNPs enhances genomic prediction robustness: a proof-of-concept study in Pacific white shrimp (Litopenaeus vannamei)

Shi, J.; Lu, Z.; Sui, M.; Mu, M.; Zhang, D.; Bao, Z.; Hu, J.; Zeng, Q.; Ye, Z.

2026-02-17 genetics 10.64898/2026.02.14.705940 medRxiv
Top 0.1%
18.4%

Background: Genomic selection (GS) has revolutionized animal breeding, from livestock such as pigs and cattle to aquatic species such as fish and shrimp. However, its broader application across these industries is often constrained by high genotyping costs and reduced predictive reliability across divergent populations or generations. Developing cost-effective, biologically informed genotyping strategies to overcome these limitations remains a critical goal in animal agriculture. Epigenetic annotations, particularly histone modifications, provide direct functional insights into regulatory elements underlying complex trait variation and represent a promising but underexplored resource for marker prioritization.
Results: Here, using the Pacific white shrimp (Litopenaeus vannamei) as a model organism, we conducted a proof-of-concept study integrating resequencing and phenotypic data from 972 individuals. We generated high-resolution epigenomic maps by profiling four histone marks (H3K4me1, H3K4me3, H3K27me3, and H3K27ac) across multiple embryonic stages and adult muscle tissue using CUT&Tag. These functional annotations were then leveraged to prioritize single nucleotide polymorphism (SNP) subsets for genomic prediction. Among the tested strategies, SNPs located in the muscle-specific bivalent promoter/enhancer (E6) state, characterized by the co-occurrence of active and repressive marks, consistently maximized prediction accuracy under the BayesA model. Notably, even at a moderate density (15k), E6-derived SNPs achieved prediction accuracies exceeding those obtained using substantially larger genome-wide SNP sets. Most importantly, in a challenging cross-population validation using an independent strain, the E6-derived SNP subset significantly improved prediction accuracy by 47.6% (increasing from 0.21 ± 0.05 to 0.31 ± 0.04, p < 0.05) compared to random subsets at equivalent density.
Conclusions: These results demonstrate that epigenetic annotation-guided SNP prioritization provides a biologically informed and cost-effective strategy to enhance genomic prediction accuracy and stability. This framework is broadly transferable across species and offers a practical strategy for designing low-density genotyping panels that reduce costs while maintaining reliable selection outcomes in large-scale breeding programs.
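
The 47.6% improvement quoted in this abstract is a relative gain computed from the two reported mean accuracies, not a difference in percentage points. The short check below reproduces it from the values in the abstract; the variable names are illustrative only.

```python
# Mean prediction accuracies as reported in the abstract.
random_subset_accuracy = 0.21
e6_subset_accuracy = 0.31

# Absolute gain in accuracy units, and gain relative to the random baseline.
absolute_gain = e6_subset_accuracy - random_subset_accuracy
relative_gain_percent = 100 * absolute_gain / random_subset_accuracy
```

Rounded to one decimal place the relative gain matches the reported 47.6%, while the absolute gain is about 0.10.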

11
Revisiting the genetics of Lake Constance Coregonids using lake-wide whole genome sequencing

Jacobs, A.; Roch, S.; Roberts, B.; Capstick, M.; Brinker, A.

2026-01-18 ecology 10.64898/2026.01.18.700192 medRxiv
Top 0.1%
18.0%

Anthropogenic pressures can have detrimental impacts on fish populations, with their effective management and conservation requiring accurate monitoring tools. Yet, this is not straightforward for closely-related, co-existing species that are difficult to distinguish using simple phenotypic or genetic approaches. Coregonids are of cultural and economic importance across Europe but have faced a multitude of pressures over the last century. Yet genomic management tools are lacking. In Lake Constance, a large pre-alpine lake, stocks have drastically collapsed due to a multitude of pressures, leading to a fishery closure. Here, we adopt a cost-effective, whole genome sequencing approach for lake-wide assessment of stock composition, spatial distribution and genetic diversity of highly admixed Lake Constance whitefish (Coregonus spp.). By sequencing 983 adult and larval genomes, we show that nearly 90% of the stock is made up of one of three species, the Gangfisch (C. macrophthalmus), and define the genetic relationship between Upper and Lower Lake Constance whitefish stocks. We also identified strong mixing between Gangfisch and Blaufelchen (C. wartmanni) on traditionally species-specific spawning grounds, and detected strong admixture in larvae, with potentially drastic impacts on the effectiveness of hatchery supplementation and stocking. Despite the collapse and admixture, species still exhibit low to moderate levels of genetic diversity, maintain ecologically-relevant genetic differences, and seem to show differences in habitat use. Overall, we present a cost-effective, translatable tool for stock-wide sequencing and genetically-informed fisheries management, with our results calling for the re-evaluation of current management practices to avoid the potential genetic mixing between species.

12
Deciphering chromosome fusion in D. miranda's neo-sex chromosome through single-copy and repetitive oligo probes

Bruno, H.; Almeida, I.; D. Vibranovski, M.

2026-01-22 evolutionary biology 10.64898/2026.01.20.700498 medRxiv
Top 0.1%
17.9%

Drosophila miranda is considered an excellent model for studying sex chromosome evolution due to its neo-sex chromosomes, which originated from fusions between autosomes and sex chromosomes. In this study, we took advantage of the latest genome assembly of D. miranda to design the first oligo probe libraries targeting neo-sex chromosomes, covering X- and Y-linked regions with ages ranging from ~1.5 to 60 million years. These libraries, which include both single-copy and repetitive oligos, were generated by integrating the OligoY approach into the conventional OligoMiner pipeline and validated through fluorescence in situ hybridization (FISH). We optimized oligo density and spacing parameters to predict consistent and effective chromosome painting. Beyond tool improvement, our mapping of the three largest unplaced Y-linked scaffolds in D. miranda reveals a complex evolutionary mechanism driving the current structure of the Y chromosome, including chromosomal translocation, centromere loss, and inversions. This work provides essential tools for sex chromosome identification via probe labeling and offers a foundation for exploring the spatial and evolutionary dynamics of sex chromosomes across different cell types.
Author summary: While previous studies have focused on using single-copy oligonucleotides for chromosome painting, these oligos have limited effectiveness in targeting repetitive regions such as ribosomal DNA, pericentromeres, and especially Y chromosomes. In this study, we integrated the OligoMiner and OligoY pipelines to design highly specific oligonucleotide libraries capable of targeting both single-copy and repetitive regions in any chromosome, enabling comprehensive painting of autosomes and sex chromosomes. Using Drosophila miranda neo-sex chromosomes as a model, we validated the specificity of our oligo libraries through fluorescence in situ hybridization (FISH). Our results demonstrate that it is possible to achieve successful chromosome painting of sex chromosomes ranging from 1.5 to 60 million years old by combining single-copy and repetitive oligos, without compromising specificity. Notably, we painted the neo-Y chromosome of D. miranda and proposed a hypothesis for the origin of its current structure. This approach provides a powerful tool for studying chromosome evolution and organization, particularly in complex and repetitive genomic regions.

13
HapNet: a new Python package for automated population-aware haplotype network analysis and visualization

Davinack, A. A.

2026-02-19 evolutionary biology 10.64898/2026.02.18.706154 medRxiv
Top 0.1%
16.7%

Haplotype networks are widely used in population genetics and phylogeography to visualize genealogical relationships among DNA sequences and to infer population structure, historical connectivity, and demographic processes. Existing software for haplotype network construction relies primarily on interactive graphical interfaces, which limits reproducibility, automation, and integration into modern bioinformatic workflows. Here, I introduce HapNet, an open-source Python package that enables automated construction, visualization, and summarization of haplotype networks directly from aligned FASTA files. HapNet is the first Python-native package designed specifically for automated, population-aware haplotype network construction and visualization from aligned FASTA files. HapNet implements a minimum-spanning-tree approach based on Hamming distances among haplotypes and incorporates population metadata encoded in sequence headers to produce population-aware network visualizations in which shared haplotypes are represented as pie charts and node sizes scale with haplotype frequency. In addition to a publication-ready network, HapNet generates machine-readable tabular output describing haplotype composition, population membership, and shared versus private haplotypes, facilitating downstream statistical analysis and reproducibility. HapNet's utility is demonstrated using mitochondrial DNA sequences from the shell-boring polychaete worm Polydora neocaeca, illustrating how the software reveals patterns of population connectivity and haplotype sharing. HapNet provides a reproducible, scriptable alternative to existing graphical tools and is freely available via the Python Package Index and GitHub.
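
The general technique named in this abstract, a minimum spanning tree over Hamming distances among haplotypes, can be sketched in a few lines of standard-library Python. This is not HapNet's code or API. The `<population>_<id>` header convention, the function names, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def collapse(records):
    """Group identical sequences into haplotypes with per-population counts."""
    counts = defaultdict(Counter)
    for header, seq in records.items():
        population = header.split("_")[0]  # hypothetical header format: <pop>_<id>
        counts[seq][population] += 1
    haplotypes = sorted(counts)
    return haplotypes, [counts[h] for h in haplotypes]


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm on pairwise Hamming distances; returns (i, j, dist) edges."""
    n = len(haplotypes)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Cheapest edge linking a haplotype in the tree to one outside it.
        dist, i, j = min(
            (hamming(haplotypes[i], haplotypes[j]), i, j)
            for i in in_tree for j in range(n) if j not in in_tree
        )
        edges.append((i, j, dist))
        in_tree.add(j)
    return edges


records = {
    "popA_1": "ACGTACGT",
    "popA_2": "ACGTACGT",
    "popB_1": "ACGTACGT",
    "popB_2": "ACGTACGA",
    "popC_1": "ACGAACGA",
}
haplotypes, pop_counts = collapse(records)
edges = minimum_spanning_tree(haplotypes)
```

Here `pop_counts` holds the per-population tallies that a pie-chart node would display, and the sum of each tally gives the haplotype frequency that would set node size.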

14
Assessing the potential of bee-collected pollen sequence data to train machine learning models for geolocation of sample origin

Hayes, R. A.; Kern, A. D.; Ponisio, L. C.

2026-04-01 bioinformatics 10.64898/2026.03.29.715128 medRxiv
Top 0.1%
14.9%

Pollen is a robust and widespread substance that captures a historical snapshot of a specific time and place, and it can be used to track movements through space by examining the pollen deposited on various objects. Palynology, the study of pollen, is used across fields such as conservation, natural history, and forensics, where it is particularly useful for tracing the origin and movement of objects. However, pollen has remained underutilized due to the difficulty of distinguishing many pollen taxa beyond the family level and limited pollen reference material to support location predictions. With recent developments in pollen DNA metabarcoding these issues have been rectified, but much of the available pollen data are primarily from wind-pollinated species, which are widespread and less informative of specific sample locations. Bee-collected pollen presents an untapped resource in training predictive models to geolocate sample origin. Here we compiled bee-collected pollen DNA sequence relative abundance data from three projects in the western U.S. and assessed the accuracy of supervised machine learning models to predict the location of sample origin based solely on pollen assemblage, without the need to incorporate additional data. Random Forest and k-Nearest Neighbors models yielded high accuracy across all projects. We also found that models trained on taxonomically clustered pollen amplicon sequence variants (ASVs) performed slightly better than those trained on raw sequence data, but the difference was minor, indicating that models trained on raw sequence data can reliably predict location and avoid the time-consuming taxonomic assignment process. Our results demonstrate the utility of repurposing bee-collected pollen for geolocation and provide a framework for employing supervised machine learning in future geolocation efforts.
Highlights: Bee-collected pollen metabarcoding data were used to accurately predict sample origin; Random Forest and k-Nearest Neighbors algorithms were most accurate with lowest error; taxonomically classified and raw DNA sequence data training sets performed comparably.
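
As a toy illustration of predicting sample origin from a pollen assemblage alone, the sketch below runs a plain k-nearest-neighbours vote on relative-abundance vectors. It is a minimal standard-library version written for this listing, not the authors' pipeline; the read counts, site labels, and function names are invented.

```python
from collections import Counter
from math import sqrt


def to_relative(counts):
    """Convert raw read counts per taxon into relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train, query, k=3):
    """Majority label among the k training samples closest to the query.

    train: list of (relative_abundance_vector, site_label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical read counts for three pollen taxa at two sites.
train = [
    (to_relative([80, 15, 5]), "site_1"),
    (to_relative([70, 20, 10]), "site_1"),
    (to_relative([75, 20, 5]), "site_1"),
    (to_relative([10, 30, 60]), "site_2"),
    (to_relative([5, 25, 70]), "site_2"),
    (to_relative([15, 25, 60]), "site_2"),
]
predicted = knn_predict(train, to_relative([12, 28, 60]), k=3)
```

Real assemblages have far more taxa than samples per site, so in practice the choice of distance measure and of k would need validation on held-out data.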

15
Inferring the number of spawning events from young-of-year genomic samples and otolith-derived birth dates: a richness-estimator perspective

Akita, T.; Yohei, T.; Hiroshige, T.

2026-01-21 ecology 10.64898/2026.01.19.700488 medRxiv
Top 0.1%
14.4%

Estimating the number of spawning events per female is key to understanding individual reproductive output in batch-spawning species, yet direct observation of spawning is often infeasible in the wild. Recent advances in genetic kinship inference enable the identification of maternal half siblings from young-of-the-year genomic samples, while otolith-based age determination provides reconstruction of offspring birth dates. Here we develop an offspring-based framework for estimating the number of clutches produced by individual females by integrating sibling structure inferred from genomic data with otolith-derived birth-date information. By recasting clutch identification as a richness estimation problem, we apply the Chao1 estimator to infer the total number of spawning events from incomplete offspring samples. Using simulation experiments, we evaluate how sampling effort and heterogeneity in clutch size influence clutch detection and estimation. Under uniform clutch-size distributions, modest numbers of offspring sampled per maternal family (10-20 offspring) yield accurate estimates of the total number of clutches, substantially outperforming naive counts of observed birth-date classes by recovering information from rare or unobserved spawning events. In contrast, skewed or multimodal clutch-size distributions lead to underestimation at low sample sizes, indicating that uneven reproductive output increases sampling effort required for reliable inference. Overall, our results demonstrate how offspring genomic data and otolith-derived birth dates can be jointly leveraged to reconstruct individual spawning histories under realistic sampling constraints. This perspective provides a framework for inferring within-season reproductive schedules in batch-spawning species, and highlights opportunities for integrating genomic and life-history data in fisheries monitoring and reproductive ecology.
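
The richness-estimator framing in this abstract can be shown with the classic Chao1 formula: treat each birth-date class within a maternal family as a "species" and the offspring sampled from it as its abundance. The sketch below is a minimal version under that reading. Which Chao1 variant the authors use is not stated in the abstract, and the birth dates are invented.

```python
from collections import Counter


def chao1(class_counts):
    """Chao1 estimate of total classes from counts per observed class.

    f1 = classes seen exactly once, f2 = classes seen exactly twice.
    """
    observed = [c for c in class_counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)
    f2 = sum(1 for c in observed if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form evaluated at f2 = 0, avoiding division by zero.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical otolith-derived birth dates for 12 offspring of one female.
birth_dates = (
    ["05-01"] * 4 + ["05-09"] * 3 + ["05-17"] * 2 + ["05-25", "06-02", "06-10"]
)
offspring_per_clutch = Counter(birth_dates)
estimated_clutches = chao1(offspring_per_clutch.values())
```

Six clutches are observed here, but the three singletons and one doubleton push the estimate to 10.5, showing how rarely sampled clutches signal unobserved spawning events.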

16
Environmental DNA as an Indicator of Seasonal Reproductive Phenology in Freshwater Mussels

Marshall, N.; Dean, C.; Sierra, M.; Fleece, W. C.

2026-02-20 ecology 10.64898/2026.02.19.706874 medRxiv
Top 0.1%
14.3%

Unionid freshwater mussels exhibit a unique form of mitochondrial inheritance, termed doubly uniparental inheritance, in which maternal and paternal mitotypes are each transmitted uniparentally. The exclusive presence of a male mitotype in gonadal tissue and sperm cells suggests that environmental DNA (eDNA) could serve as a non-invasive method for monitoring freshwater mussel reproduction. Yet, the dynamics of male mitotype detection within the environment remain poorly understood. This study analyzed seasonal eDNA samples from two diverse mussel beds, detecting 24 mitochondrial operational taxonomic units (MOTUs) associated with the male mitotype. Peaks in male mitotype signal for mussels identifiable to the species level generally aligned with expected spawning periods based on female gravidity records (e.g., Pyganodon grandis, Lasmigona costata, Ortmaniana ligamentina). Additionally, male mitotype detection was often sporadic compared to the consistently detected female mitotype, indicating that male signals may be tied to behavioral or reproductive events rather than continuous shedding. While elevated male signals may reflect spawning, alternative sources such as tissue decay, mitotype leakage, glochidia release, or post-spawning gamete clearance complicate interpretation. A male-to-female mitotype ratio is proposed as a more reliable proxy for identifying sperm release events, given the high concentration of male mitotypes that occurs within spermatozeugmata. Limitations in male mitotype reference databases hindered species-level resolution for many MOTUs, underscoring the need for expanded genomic resources. Overall, this work demonstrates that male mitotype eDNA likely provides valuable insights into mussel reproductive ecology, while emphasizing the importance of long-term monitoring and integrated gametogenesis studies to refine its application in conservation.

17
eDNA metabarcoding provides scalable and continuous biodiversity monitoring across the tree of life

Macher, T.-H.; Schuetz, R.; Arle, J.; Beermann, A.; Haase, P.; Koschorrek, J.; Krehenwinkel, H.; Mora, D.; Sinclair, J. S.; Zimmermann, J.; Leese, F.

2026-02-14 ecology 10.64898/2026.02.12.705487 medRxiv
Top 0.1%
10.2%

Environmental DNA (eDNA) metabarcoding has the potential to substantially expand our knowledge of global biodiversity beyond that provided by conventional approaches. However, the degree to which eDNA data provides real ecological insight, rather than primarily reflecting environmental factors that affect eDNA shedding and degradation, remains unclear. Additional uncertainties arise in terms of cost-effectiveness and whether the price is worth any extra biodiversity information that is gleaned. Here, we established a high-resolution, bi-weekly eDNA time-series in Germany across aquatic and riparian habitats to quantify seasonal biodiversity dynamics, to relate eDNA to different potential environmental drivers, and to parametrize cost estimates. Over one year, eDNA metabarcoding detected more than 1,000 species across multiple trophic levels and primarily revealed real, taxon-specific seasonal patterns, in addition to some relationships to water temperature, discharge, and conductivity. Compared to historical records dating back to 1891, year-round eDNA monitoring increased reported species numbers by 2.4-fold for invertebrates, 2.2-fold for mammals, 1.7-fold for diatoms, 1.2-fold for fish and lamprey, and 1.03-fold for birds. Cumulative biodiversity estimates increased strongly with sampling frequency, demonstrating the value of eDNA for high-frequency time-series monitoring. Moreover, eDNA monitoring was highly cost-effective, providing more than twice the biodiversity information of many conventional surveys for one-sixth the cost, enabling scalable, high-resolution freshwater biodiversity assessments.

18
Environmental RNA improves detection and surveillance of schistosomiasis transmission

Mercier, C.; Douchet, P.; Pouzol, D.; Allienne, J.-F.; Caldeira, R. L.; Mourao, M. M.; Lima, M. G.; Montresor, L. C.; Blanchet, S.; Loot, G.; Rey, O.

2026-01-21 ecology 10.64898/2026.01.19.699857 medRxiv
Top 0.1%
9.9%

Environmental diagnosis of schistosomiasis, a tropical disease affecting more than 250 million people globally, is still challenging, which limits efficient eradication plans. There is a crucial need for resolutive and highly sensitive environmental tools to improve disease control. However, a major obstacle is the inability of current methods, including environmental DNA (eDNA), to distinguish infectious parasite life stages. Here, we develop and validate an environmental RNA (eRNA) approach that enables the detection and absolute quantification of Schistosoma mansoni miracidia and cercariae directly from water samples. After identifying life stage-specific transcripts, we design ddPCR assays that are highly specific to S. mansoni and to each life stage. Laboratory experiments demonstrate that stage-specific eRNA assays accurately reflect the relative abundance of each life stage, detect nucleic acids released from organisms, and exhibit detection limits tenfold lower than those of eDNA assays. Also, under laboratory conditions, RNA persists in water longer than DNA. Field validation at six endemic sites in Brazil confirms that eRNA outperforms eDNA and conventional snail surveys in detecting the active presence of S. mansoni cercariae, which indicates schistosomiasis transmission risk to humans locally. By enabling active-stage discrimination in environmental monitoring and improving sensitivity (compared to eDNA), our study advances both fundamental understanding and applied surveillance of schistosomiasis transmission, supporting elimination initiatives in affected regions.

19
SPrOUT: A computational and targeted sequencing approach for mixed plant DNA identification with Angiosperms353

Hu, N.; Bullock, M. R.; Jackson, C.; Miller, C.; Hunter, E.; Huff, C.; Chen, Y.; Handy, S.; Johnson, M.

2026-02-23 bioinformatics 10.64898/2026.02.20.707031 medRxiv
Top 0.1%
9.9%

Premise: The identification of plant species from mixed samples is crucial in various fields, including ecological surveys, conservation efforts, and food and dietary supplement safety. Traditional methods face potential challenges due to the high costs of DNA sequencing, inefficiencies in computational workflows, and incomplete sequence databases. Methods and Results: This study introduces a novel approach using the Angiosperms353 target sequencing kit for efficient taxonomic identification of angiosperm DNA in mixed samples. Our method assembles short paired-end reads for each mixed sample. Using gene sets of Angiosperms353 from 871 species, we apply phylogenetic inference to categorize the variance in phylogenetic distance across genes to identify the presence of taxa in mixed plant samples. The pipeline reaches 98.1 to 99.6% accuracy, 92.9 to 100% precision for identifying unknown taxa in in-silico mixes, and 90.7% accuracy and 98.0% precision for mock supplement mixtures. We explored the parameter cutoffs of the pipeline to offer an empirical range for different applications. Conclusions: The Angiosperms353 and HybPiper assembly proved effective in sorting mixed plant DNA samples. Our method offers a framework for scientific and practical applications in plant species identification in both single and mixed samples.

20
Novabrowse: A Tool for High-Resolution Synteny Analysis, Ortholog Detection, and Gene Signal Discovery

Rikk, L.; Ghaffarinia, A.; Leigh, N. D.

2026-03-30 genomics 10.64898/2026.03.27.714371 medRxiv
Top 0.1%
9.8%

Accurate genome annotation remains challenging as assembly quality often exceeds annotation reliability. Resolving ambiguities of gene presence, absence, and orthology typically requires integrating two complementary lines of evidence: sequence homology between species and the conservation of gene order (i.e., synteny). BLAST remains the standard for homology detection, yet its raw output can be difficult to interpret. Existing tools address this challenge but operate at opposing scales. Alignment viewers provide detailed pairwise statistics without genomic context, while synteny tools offer chromosome-scale perspectives without sequence-level resolution. To fill this intermediate gap, we developed Novabrowse, an interactive BLAST results interpretation framework featuring high-resolution multi-species synteny analysis, chromosomal re-arrangement investigation, ortholog detection, and gene signal discovery. Users define a genomic region of interest in a query species and/or use custom sequences, then select one or more subject species for comparison. The pipeline retrieves query gene sequences via NCBI API integration and performs BLAST searches against each subject transcriptome or genome. Results are presented via an interactive HTML file featuring alignment statistics, chromosomal maps, coverage visualizations, ribbon plots, and distance-based clustering of high-scoring segment pairs into putative gene units. We demonstrate these capabilities by investigating Foxp3, Aire, and Rbl1, three highly conserved vertebrate genes, in the recently assembled genome of the newt Pleurodeles waltl. Foxp3 and Aire have not been described in any salamander species to date, despite availability of multiple assemblies and extensive transcriptomic datasets. Using Novabrowse, we discovered conserved loci and gene signals for both genes in P. waltl, the presence of which was subsequently confirmed via Nanopore long-read RNA sequencing. 
In contrast, Rbl1 analysis uncovered a chromosomal rearrangement at its expected locus with no gene signal detected, indicating a gene loss specific to P. waltl despite the gene's retention in the closely related axolotl (Ambystoma mexicanum). Our findings demonstrate Novabrowse's capacity for evidence-based evaluation of annotation artifacts, an essential capability as high-quality assemblies become more available for phylogenetically diverse species. Novabrowse is open source (MIT license) and freely available at: https://github.com/RegenImm-Lab/Novabrowse.